OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems

Internal identifier: 001E68 (Main/Exploration); previous: 001E67; next: 001E69

Authors: Khaled Mostafa [Egypt]; I. Shaheen [Egypt]; M. Darwish [Egypt]; Ibrahim Farag [Egypt]

Source: Lecture Notes in Computer Science (ISSN 0302-9743), 1999

RBID : ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA

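French descriptors: Arabe; Reconnaissance forme; Reconnaissance optique caractère; Reconnaissance écriture; Système intelligent

English descriptors: Arabic; Handwriting recognition; Intelligent system; Optical character recognition; Pattern recognition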

Abstract

In this paper, we propose a new approach for detecting and correcting segmentation and recognition errors in Arabic OCR systems. The approach is suitable for both typewritten and handwritten script recognition systems. Error detection is based on rules of the Arabic language and a morphological analyzer. This type of analysis keeps the dictionary to a practical size: a complete dictionary of roots (no more than 5641 roots), the morphological rules, and all valid patterns can be kept in a moderately sized file. Recognition channel characteristics are modeled using a set of probabilistic finite state machines. Contextual information is used in the form of transition probabilities between letters of a previously defined vocabulary (finite lexicon) and transition probabilities of garbled text. The developed detection and correction modules have been incorporated as a post-processing phase in an Arabic handwritten cursive script recognition system. Experimental results show a considerable enhancement in performance.
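
As an illustration of how letter transition probabilities over a finite lexicon can rank correction candidates, here is a minimal Python sketch. It is not the authors' implementation: the bigram model, the add-alpha smoothing, the function names (`train_transitions`, `log_score`, `best_candidate`), the boundary markers, and the transliterated toy lexicon are all assumptions made for the example, and the paper's morphological analysis and probabilistic finite state machines are not modeled.

```python
import math
from collections import Counter

START, END = "^", "$"  # hypothetical word-boundary markers


def train_transitions(lexicon):
    """Count letter-to-letter transitions over every word in the lexicon."""
    pair_counts = Counter()   # (previous letter, current letter) -> count
    prev_counts = Counter()   # previous letter -> count
    alphabet = {END}
    for word in lexicon:
        padded = START + word + END
        alphabet.update(word)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return pair_counts, prev_counts, len(alphabet)


def log_score(word, model, alpha=1.0):
    """Average log-probability per transition, with add-alpha smoothing."""
    pair_counts, prev_counts, vocab = model
    padded = START + word + END
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (pair_counts[(prev, cur)] + alpha) / (prev_counts[prev] + alpha * vocab)
        total += math.log(p)
    return total / (len(padded) - 1)


def best_candidate(candidates, model):
    """Return the candidate word that the transition model scores highest."""
    return max(candidates, key=lambda w: log_score(w, model))


# Toy lexicon of transliterated words (assumed data, for illustration only).
model = train_transitions(["ktb", "katib", "maktab", "kitab"])
print(best_candidate(["kitaq", "kitab"], model))  # prints: kitab
```

In this sketch, a garbled reading such as `kitaq` scores lower than `kitab` because the transitions into and out of `q` never occur in the lexicon and receive only the smoothing mass.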

Url: https://api.istex.fr/document/3E3F186B8873FE74B41C4CF4826422234F1E68BA/fulltext/pdf
DOI: 10.1007/978-3-540-48765-4_57


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author>
<name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
</author>
<author>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
</author>
<author>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
</author>
<author>
<name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA</idno>
<date when="1999" year="1999">1999</date>
<idno type="doi">10.1007/978-3-540-48765-4_57</idno>
<idno type="url">https://api.istex.fr/document/3E3F186B8873FE74B41C4CF4826422234F1E68BA/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000546</idno>
<idno type="wicri:Area/Istex/Curation">000539</idno>
<idno type="wicri:Area/Istex/Checkpoint">001419</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Mostafa K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">001F77</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:99-0397539</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000813</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B81</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000799</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Mostafa K:a:novel:approach</idno>
<idno type="wicri:Area/Main/Merge">002180</idno>
<idno type="wicri:Area/Main/Curation">001E68</idno>
<idno type="wicri:Area/Main/Exploration">001E68</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems</title>
<author>
<name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Information Technology Department, Faculty of Computers and Information, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Computer Engineering Department, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Computer Engineering Department, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author>
<name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Institute of Statistical Studies and Research, Cairo University, 12613, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>1999</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">3E3F186B8873FE74B41C4CF4826422234F1E68BA</idno>
<idno type="DOI">10.1007/978-3-540-48765-4_57</idno>
<idno type="ChapterID">57</idno>
<idno type="ChapterID">Chap57</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Arabic</term>
<term>Handwriting recognition</term>
<term>Intelligent system</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Arabe</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance écriture</term>
<term>Système intelligent</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In this paper, we propose a new approach for detecting and correcting segmentation and recognition errors in Arabic OCR systems. The approach is suitable for both typewritten and handwritten script recognition systems. Error detection is based on rules of the Arabic language and a morphology analyzer. This type of analysis has the advantage of limiting the size of the dictionary to a practical size. Thus, a complete dictionary for roots, which does not exceed 5641 roots, the morphological rules and all valid patterns can be kept in a moderate size file. Recognition channel characteristics are modeled using a set of probabilistic finite state machines. Contextual information is utilized in the form of transitional probabilities between letters of previously defined vocabulary (finite lexicon) and transitional probabilities of garbled text. The developed detection and correction modules have been incorporated as a post-processing phase in an Arabic handwritten cursive script recognition system. Experimental results show a considerable enhancement in performance.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Égypte</li>
</country>
</list>
<tree>
<country name="Égypte">
<noRegion>
<name sortKey="Mostafa, Khaled" sort="Mostafa, Khaled" uniqKey="Mostafa K" first="Khaled" last="Mostafa">Khaled Mostafa</name>
</noRegion>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<name sortKey="Darwish, M" sort="Darwish, M" uniqKey="Darwish M" first="M." last="Darwish">M. Darwish</name>
<name sortKey="Farag, Ibrahim" sort="Farag, Ibrahim" uniqKey="Farag I" first="Ibrahim" last="Farag">Ibrahim Farag</name>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
<name sortKey="Shaheen, I" sort="Shaheen, I" uniqKey="Shaheen I" first="I." last="Shaheen">I. Shaheen</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001E68 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001E68 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:3E3F186B8873FE74B41C4CF4826422234F1E68BA
   |texte=   A Novel Approach for Detecting and Correcting Segmentation and Recognition Errors in Arabic OCR Systems
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024